Newsgroups: comp.lang.forth
Date: Sun, 9 Apr 2023 12:38:17 -0700 (PDT)
In-Reply-To: <2023Apr9.192051@mips.complang.tuwien.ac.at>
Message-ID: <ccf8d2a0-6ee4-4896-8f82-e49791c66729n@googlegroups.com>
Subject: Re: 8 Forth Cores for Real Time Control
From: Lorem Ipsum <gnuarm.deletethisbit@gmail.com>

On Sunday, April 9, 2023 at 1:46:46 PM UTC-4, Anton Ertl wrote:
> Lorem Ipsum <gnuarm.del...@gmail.com> writes:
> >On Sunday, April 2, 2023 at 9:03:48 AM UTC-4, Anton Ertl wrote:
> >> Yes.  My wording was misleading.
> >> What I meant: If you want to
> >> implement a barrel processor with a stack architecture, you have to
> >> treat the stack in many respects like a register file, possibly
> >> resulting in a pipeline like above.
> >
> >I'm still not following.  I'm not sure what you have to do with the
> >register file, other than to have N of them like all other logic.  The
> >stack can be implemented in block RAM.
>
> Like a register file.

In what way does this impact the pipeline?  You are talking, but not
explaining.

> By contrast, with a single-threaded approach, you can use the ALU
> output latch or the left ALU input latch as the TOS, reducing the
> porting requirements or increasing the performance.

Sorry, I don't know what you mean.  You are describing something that is
in your head, without explaining it.

The ALU does not require a register on the output.  You can do that, but
you also need multiplexing to allow other sources to reach the TOS
register.  You can try to use the ALU as your mux, but, in reality, that
just moves the mux to the input of the ALU.  For example, R> needs a
data path from the return stack to the data stack.  That can be input to
a mux feeding the TOS register, or it can be input to a mux feeding an
ALU input.  It's a mux either way.

> >A small counter points to the stack being processed at that time.  You
> >can only perform one stack read and one write for each processor per
> >instruction.
>
> That means that an instruction like + would need two cycles if both
> operands come from the block RAM.  By contrast, with a single-threaded
> stack processor you can use a single-ported SRAM block for the stack
> items below the TOS, and still perform + in one cycle.

I don't know what a single-threaded anything is.
I don't understand your usage.

The TOS can be a separate register from the block RAM, OR you can use
two ports on the block RAM.  I prefer to use a TOS register, and use the
two block RAM ports for read and write, because the addresses are
typically different.  You read from address x, or you write to address
x+1.  So the address counter for the stack has an output from the
register and an output from the increment/decrement logic.

> >> By contrast, for a single-thread stack-based CPU, what is the
> >> forwarding bypass (i.e., an optimization) of a register machine is the
> >> normal path for the TOS of a stack machine; but not for a barrel
> >> processor with a stack architecture.
> >
> >I guess I simply don't know what you mean by "forwarding bypass".  I
> >found this.
> >
> >https://en.wikipedia.org/wiki/Operand_forwarding
> >
> >But I don't follow that either.  This has to do with the data of the
> >two instructions being related.  In the barrel stack processor, each
> >phase of the processor is an independent instruction stream.
>
> Yes, so you throw away the advantage that the stack architecture gives
> you:

Sorry, that is not remotely clear to me.  Using a pipeline to turn a
single processor into multiple processors uses the same logic in the
same way, for multiple instruction streams, with no interference.  Using
pipelining to speed up a single instruction stream requires extra logic
and yields a limited speedup because of pipeline stalls and flushes.

> For a register architecture, the barrel processor approach means that
> you don't need to implement the forwarding bypass.

Which is not needed for the stack processor.  What is your point?
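As an aside, the single-phase arrangement described above (TOS in its own register, everything below it in a block RAM with one read port and one write port) can be sketched as a behavioral model.  This is an illustrative sketch, not code from the thread, and not RTL; the class and method names are invented:

```python
# Behavioral sketch (not RTL) of the stack datapath described above:
# the TOS lives in a register; items below TOS live in a block RAM with
# one read port and one write port.  Each operation needs at most one
# RAM read and one RAM write, so each can complete in a single cycle.
class StackDatapath:
    def __init__(self, depth=32):
        self.ram = [0] * depth   # block RAM: stack items below TOS
        self.sp = -1             # address counter (points at NOS)
        self.tos = 0             # top-of-stack register

    def push(self, value):
        # one RAM write: spill the old TOS to address sp+1 while the
        # new value is muxed into the TOS register
        self.sp += 1
        self.ram[self.sp] = self.tos
        self.tos = value

    def pop(self):
        # one RAM read: refill the TOS register from address sp
        value = self.tos
        self.tos = self.ram[self.sp]
        self.sp -= 1
        return value

    def add(self):
        # '+': one RAM read (NOS), result lands in the TOS register;
        # no RAM write is needed, hence one cycle with one read port
        self.tos = self.ram[self.sp] + self.tos
        self.sp -= 1

s = StackDatapath()
s.push(3)
s.push(4)
s.add()
print(s.pop())  # 7
```

Note that + reads only the NOS from RAM and writes nothing back, which is why the items below TOS never need more than one port per direction.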
> For a single-threaded stack architecture, you don't need the data path
> of the TOS through the register file/SRAM block (well, not quite, you
> need to put the TOS in the register file when you perform an
> instruction that just pushes something, but the usual path is directly
> from the ALU output to the left ALU input).  I discussed the
> advantages of that above.  A barrel processor approach means that this
> advantage goes away, or at least the whole thing becomes quite a bit
> more complex.

Sorry, I have no idea what you are talking about.  Why are you talking
about TOS and register files?  Do you mean TOS and stack?

> >Every time the stack is adjusted, the CPU would stall.
>
> Does not sound like a competent microarchitectural design to me.

Whatever.  You have so butchered the quoting that this statement is
hanging in isolation, so I have no idea what the context is.

Can you reply without the garbage at the ends of lines?  What is the
"=20" thing?

> >> The logic added in pipelining depends on what is pipelined (over in
> >> comp.arch Mitch Alsup has explained several times how expensive a
> >> deeply pipelined multiplier is: at some design points it's cheaper to
> >> have two multipliers with half the pipelining that are used in
> >> alternating cycles).
> >
> >If you are talking about adding logic for a pipeline, that is some
> >optimization you are performing.  It's not inherent in the pipelining
> >itself.  Pipelining only requires that the logic flow be broken into
> >steps by registers.
>
> Yes, and these registers are additional logic that costs area.  In the
> case of the deeply pipelined multiplier there would be so many bits
> that would have to be stored in registers for some pipeline stage that
> it's cheaper to have a second multiplier with half the pipelining
> depth.
Of course pipeline registers use s= pace a chip. Duh! Do you have a point about this, or are you just looking= to debate the topic ad infinitum? =20 1) In FPGAs, the registers are typically free. They have a register with n= early every logic element. =20 2) When pipelining a stack processor, there is no need to pipeline the stac= k, unless you have an overly complex design that was overly slow to begin w= ith. A stack is a block of RAM with an address pointer. In a barrel proce= ssor, the address pointer is a small RAM as well, rotating through the phas= es as the pipeline progresses (typically implemented in distributed RAM). = An instruction like ADD pops the stack and writes the ALU result into the T= OS register. One operation, one clock cycle, no need for confusing anythin= g between phases. No pipelining of the stack.=20 Is there anything here, that is not clear?=20 --=20 Rick C. -++ Get 1,000 miles of free Supercharging -++ Tesla referral code - https://ts.la/richard11209